Transcriptome analysis reveals cell cycle-related transcripts as key determinants of varietal differences in seed size of Brassica juncea

Brassica juncea is an important oilseed crop, widely grown as a source of edible oil. Seed size is a pivotal agricultural trait in oilseed Brassicas. However, the regulatory mechanisms underlying seed size determination are poorly understood. To elucidate the transcriptional dynamics involved in the determination of seed size in B. juncea, we performed a comparative transcriptomic analysis using developing seeds of two varieties, small-seeded Early Heera2 (EH2) and bold-seeded Pusajaikisan (PJK), at three distinct stages (15, 30 and 45 days after pollination). We detected 112,550 transcripts, of which 27,186 and 19,522 were differentially expressed in the intra-variety comparisons and inter-variety comparisons, respectively. Functional analysis using pathway, gene ontology, and transcription factor enrichment revealed that cell cycle- and cell division-related transcripts stay upregulated during later stages of seed development in the bold-seeded variety but are downregulated at the same stage in the small-seeded variety, indicating that an extended period of cell proliferation in the later stages increased seed weight in PJK as compared to EH2. Further, k-means clustering and candidate genes-based analyses unravelled candidates for employing in seed size improvement of B. juncea. In addition, candidates involved in determining seed coat color, oil content, and other seed traits were also identified.

). However, one replicate of 30D PJK sample was later removed due to an extremely high percentage of duplicates. Therefore, further analysis was carried out with 17 libraries. All the libraries showed a high correlation coefficient (spearman coefficient > 0.9) between biological replicates (Supplementary Table 2). A total of 137,912 unique transcripts were identified using StringTie. From these, 112,550 transcripts with normalized expression value of transcripts per million (TPM) > 0 were analyzed further. (Supplementary Table 3). Of these, 77,875 were previously annotated in B. juncea, while 34,675 were novel transcripts. Further, out of total 112,550 expressed transcripts, 95,471 were expressed in both the varieties, while 7590 and 9489 were detected exclusively in EH2 and PJK, respectively (Fig. 1c). The analysis of transcript abundance showed that ~ 50% of transcripts were detected at TPM > 1 in each of the stages in both the varieties (Fig. 1d). Further, 469 transcripts with TPM > 1 in EH2 but no expression in PJK were referred as EH2-specific while 638 transcripts with TPM > 1 in PJK but no expression in EH2 were identified as PJK-specific (Supplementary Table 4). Some of these transcripts expressed at very high levels with two of the EH2-specific and PJK-specific transcripts having TPM > 100, whereas a set of 41 and 37 EH2-and PJK-specific transcripts, respectively had average TPM > 10. Further, we performed intra-and inter-variety comparisons to identify differentially expressed transcripts at all three stages of seed development.
Intra-variety stage-wise differential expression analysis. Pairwise differential expression analysis by measuring ≥ two-fold change at q value < 0.05 revealed differentially expressed transcripts at different stages of seed development. A total of 20,514 and 18,901 unique transcripts were differentially expressed in EH2 and PJK, respectively (Supplementary Table 5 and 6). In EH2, 10,118 transcripts were differentially expressed in the middle vs. early stage, of which 4598 and 5520 transcripts were upregulated and downregulated, respectively (Fig. 2a). While in PJK, 8427 transcripts were differentially expressed in the same comparison, and 3676 and 4751 transcripts were upregulated and downregulated, respectively. In the late vs. middle stage, 8006 transcripts were differentially expressed in EH2, of which 5221 and 2785 transcripts were upregulated and downregulated, respectively while 7667 transcripts were differentially expressed in PJK out of which 3209 and 4458 transcripts were upregulated and downregulated, respectively. In late vs. early-stage samples, 16,474 transcripts were differentially expressed in EH2 with 10,106 and 6368 transcripts upregulated and downregulated, respectively. In same comparison, 15,988 transcripts were differentially expressed in PJK, of which 7411 and 8577 transcripts were upregulated and downregulated, respectively. Thus, maximum differential expression was observed in late vs. early stage EH2 and PJK.
Inter-variety stage-wise differential expression analysis. In the inter-variety comparison between EH2 and PJK, 19,522 unique transcripts were differentially expressed during seed development (Supplementary Table 7) (Fig. 2b). In the early stage, 11,851 transcripts were differentially expressed in the PJK vs. EH2 comparison, of which 5292 and 6559 transcripts were upregulated and downregulated, respectively. In the middle stage, 8889 transcripts were differentially expressed, of which 3869 and 5020 transcripts were upregulated and downregulated, respectively. In contrast, 10,280 transcripts were differentially expressed in the late stage, of which 3125   Fig. 3a). The information about the key genes with known roles in seed traits such as embryo development, seed size, TAG biosynthesis, glucosinolate content, seed coat color, etc. was retrieved from various databases like TAIR  www.nature.com/scientificreports/ (https:// www. arabi dopsis. org), ARALIP (http:// aralip. plant biolo gy. msu. edu) BRAD (http:// brass icadb. cn/#/), and SeedGenes (https:// seedg enes. org/ index. html) and mapped onto differentially expressed gene sets (Supplementary Table 11). The key findings are elaborated below.
Functional analysis of transcripts exhibiting highest expression at the early stage. We observed that 1718 and 4065 transcripts were upregulated at an early stage in EH2 and PJK, compared with the middle and late stages, respectively (Fig. 3a). The key functions assigned to these genes using pathway mapping were cell wall cellulose synthesis, modification, and degradation; signaling receptor kinases; development (SPL); hormone metabolism like jasmonate and ethylene; protein degradation; transport, etc. (Supplementary Table 8) (Fig. 3b). The top GO categories enriched in this gene set were: cell wall, external encapsulating structure, endomembrane system, plasma membrane, apoplast, etc. (Supplementary Table 9). Further, TF fold enrichment at q value < 0.05 showed that transcripts encoding M-type MADS, SRS, ARR-B, ERF, and SBPs TFs were over-represented at the early stage (Supplementary Table 10) (Fig. 3c). Further, a total of 56 transcripts upregulated in the early-stage mapped to 38 Arabidopsis genes involved in the regulation of glucosinolate (30.4%), fatty acid elongation, oil content and TAG synthesis (25%), embryo development and endosperm cellularization (16.1%), seed germination and dormancy (16.1%), seed size (5.4%), seed development (3.5%), seed coat color (1.8%), and seed storage (1.8%) (Supplementary Table 11).
Functional analysis of transcripts exhibiting highest expression at the middle stage. In the middle stage, 2469 and 1437 transcripts were commonly upregulated in both the varieties compared to early and late stages, respectively ( Fig. 3a). They showed enrichment in Mapman pathway sub-categories like brassinosteroid metabolism, hormone transport, signaling, glycolysis, secondary metabolism, etc. (Fig. 3b). GO categories like cell wall, endomembrane system, lipid particle, lipid storage, etc., were also enriched in this dataset (Supplementary Table 9). Further, TFs belonging to B3, BES1, HD-ZIP, LBD, and MYB were overrepresented at the middle stage (Fig. 3c).
Functional analysis of transcripts exhibiting highest expression at the late stage. In the late stage, 3905 and 1290 transcripts were upregulated as compared to early and middle stages, respectively (Fig. 3a). These were enriched in pathways like light reaction, flavonoid metabolism, development (LEA), etc. (Fig. 3b). GO categories related to photosystem, phenylpropanoid metabolism, secondary metabolites, etc., were enriched in transcripts upregulated at the late stage (Supplementary Table 9). In contrast, TFs like bHLH, WOX, and YABBY were enriched (Fig. 3c). The mapping of 166 transcripts upregulated in late-stage to candidate genes proffered transcripts related to fatty acid elongation and synthesis, oil synthesis and TAG synthesis (56%), seed germination and dormancy (15.1%), embryo development (14.5%), glucosinolate (4.8%), seed coat color, mucilage, and seed coat structure (4.2%), seed development (1.8%), seed maturation (0.6%), seed size (1.8%), and seed storage (0.6%) (Supplementary Table 11). Apart from the stage-specific pathways, several pathways like regulation of transcription, auxin and gibberellin metabolism, development, cytochrome P450, etc., were enriched at all three stages. Similarly, DOF and TCP TFs were enriched in all the stages of seed development. In contrast, ZF-HD and GRFs were enriched in the middle and late stages.

Functional analysis of inter-variety differentially expressed transcripts.
To identify the critical determinants governing the variety-specific differences concerning seed weight, we first analyzed the transcripts upregulated at each stage in a variety-specific manner (Fig. 4) (Supplementary Tables 5 and 6). Among the transcripts upregulated in EH2 only, the topmost enriched pathways at the early stage were biotic stress, protein synthesis, S-assimilation, cell wall modification, protease inhibitor, etc. Those involved in DNA synthesis, cell cycle, cell division, cell wall degradation, auxin, etc. were enriched at both early and middle stages, while cytochrome P450 genes were enriched at early, middle, and late-stages. Similarly, transcripts involved in secondary metabolism, peptide transport, etc. were enriched at the early and late stage. Those involved in regulation of transcription, development-storage proteins, transport-sugars, ATP-binding cassette (ABC) transporters, lipid metabolism, plastocyanin, etc. were enriched at the middle stage while transcripts regulating development and minor carbohydrates metabolism were enriched in middle and late-stage. Further, transcripts regulating abiotic stress, development-LEA, flavonoid metabolism, brassinosteroid, and ethylene metabolism, etc. were enriched at the late-stage. The enriched GO categories also reflected the same pattern with cell cycle and DNA replication among the top categories in the early and middle stages (Supplementary Table 9). On the contrary, among the transcripts upregulated in PJK only, the topmost enriched pathways belonged to categories S-assimilation, signaling receptor kinases, ethylene, and abscisic acid (ABA) metabolism, mitochondrial electron transport, transport sugars, etc. at the early stage while those involved in peptide and protein transport were enriched at early and middle stages. Transcripts associated with biotic stress and cytochrome P450 were enriched in early, middle, and late-stages, while those associated with transport-major intrinsic proteins, cell wall degradation, regulation of transcription, development were enriched at early and late stages. Similarly, gibberellin metabolism, gluconeogenesis, calcium signaling, etc.-associated transcripts were enriched at the middle stage and those associated with light reaction, Calvin cycle, flavonoid metabolism, etc. were enriched at the middle and late- www.nature.com/scientificreports/ stages. The transcripts involved in protein synthesis, lipid metabolism, tetrapyrrole synthesis, cell wall pectin esterases and secondary metabolism were enriched at late-stage. Interestingly, cell division and cycle-related pathways were enriched among the transcripts upregulated at both early and middle stages in EH2 only, while their expression goes down in the late stage of seed development (Fig. 4). Since cell proliferation is a significant determinant of seed size regulation, this observation indicates that the cell division and cell cycle transcripts might play a key role in regulating the seed size difference among these varieties. Therefore, we shortlisted the transcripts belonging to cell division and cell cycle pathway categories which were uniquely upregulated at 15D and 30D compared to 45D in EH2 only (Supplementary Table 12). We obtained a total of 136 transcripts. While these transcripts showed a reduction in expression at the 45D stage in EH2, most of these did not show the reduction in the 45D stage in PJK (Fig. 5a). This expression pattern correlates with the differences observed in seed weight in these varieties, as the continuous assessment of seed weight from 10D to the mature stage showed that the seed weight in both the varieties does not exhibit much difference in the early stages, and the maximum seed weight difference is observed at the later stage of seed development, i.e., from 30 to 45D (Fig. 5b). Identification of the orthologous genes from Arabidopsis revealed that 45 of these transcripts were cyclins, 12 were cell division control (CDC) transcripts, 7 were Cyclophilins, 4 were Anaphase Promoting Complex (APC) transcripts, 3 were Cyclin-dependent kinases (CDK), and the rest were miscellaneous  Table 12).
To further dissect the cell cycle and division-related transcripts governing variety-specific differences in seed weight, we investigated the transcripts upregulated at the late stage in PJK compared to EH2. A total of 10,280 transcripts were differentially expressed in PJK vs. EH2 at 45D, of which 7155 transcripts were upregulated in PJK (Fig. 2b). Of these, 102 transcripts were related to cell cycle and cell division, as shown by pathway mapping (Supplementary Table 14). Further, we identified 446 phytohormones and TF-related transcripts upregulated at 45D in PJK vs. EH2. (Supplementary Table 14). Of these 548 transcripts, 11 were expressed in a PJK-specific manner (A01_VARUNA_g4163.t1, A03_VARUNA_g1726.t1, A03_VARUNA_g820.t1, B02_VARUNA_g6466. . Analysis of seed weight (mg) in EH2 and PJK from 10 days after pollination till maturity. Fresh weight is shown from 10 to 45D, and dry weight is shown at maturity. An average weight of twenty seeds was used, and three replicates were used. Standard deviation is also shown. EH2 and PJK exhibit similar seed weight till 15D, and maximum seed weight difference can be observed 30D to 45D. (c) Heatmaps showing expression patterns of transcripts mapped to eight phytohormones based on MapMan pathway mapping. The transcripts upregulated in the early and middle stages only in EH2 are shown. The boxes demarcate the transcripts, which show a decrease in expression in E45 but not in P45, indicating candidates for phytohormones transcripts responsible for continued cell proliferation in PJK. Orthology-based identification of candidate genes for regulating seed size, coat color and oil content in B. juncea. We identified candidate genes regulating seed weight from the literature to further shortlist the candidates regulating variety-specific differences in seed size. Based on the knowledge from Arabidopsis, several pathways, including transcriptional regulators, phytohormones, signaling, and other players, are well elucidated for their roles in governing seed size 32 . We identified such candidate genes and observed that 79 transcripts, orthologous to 29 Arabidopsis genes, were differentially expressed between EH2 and PJK (Supplementary Table 15). This comparison unravelled that orthologous transcripts of at least eight positive regulators  (Fig. 7). Further, we also leveraged the information available in TAIR, ARALIP, SEEDGENES, and BRAD databases for Arabidopsis gene functions of genes, to identify candidate genes regulating seed coat color and oil content (Supplementary Table 16). EH2 is a yellow-seeded variety with ~ 36.9% oil content, while PJK is a brown seeded variety with ~ 41.1% oil content 28 (Fig. 1). Orthologs of at least 14 seed coat color-related and 17 oil contentrelated Arabidopsis genes showed differential expression in EH2 and PJK indicating their potential involvement in regulating contrasting phenotypes (Table 1). We also analysed the highest expressing transcripts in each of the stages, in both the varieties, to determine if any of these are interesting candidates based on inter-varietal differential expression. For this, we shortlisted the top 100 expressing transcripts in 15D, 30D, and 45D stages of both EH2 and PJK. Among these, we obtained 11, 7, 37 highest expressing transcripts in 15D, 30D, and 45D respectively which showed significant upregulation in EH2 as compared to PJK at the respective stages, and 12, 18, and 22 transcripts that were highly expressed in 15D, 30D, and 45D in PJK and also significantly upregulated in PJK as compared to EH2 at the same stage (Supplementary Table 17). Based on overlaps with the pathway mapping and candidate genes results, we observed an interesting finding that of the 37 of these highest expressing transcripts upregulated in 45D in EH2, 10 were orthologs of Arabidopsis oleosin genes involved in oil biosynthesis (Supplementary Table 16 and 17), indicating that these may be good candidates for further dissecting the differences in lipid metabolism in EH2 and PJK.

Discussion
Identification of conserved hallmarks regulating temporal stages of seed development in Brassica juncea. We utilized two approaches to dissect the global transcriptional dynamics during the early, middle, and late stages of seed development in B. juncea. In the first approach, we identified commonly upregulated transcripts in both the varieties at different stages of development, and dissected functional categories and candidates important for governing early, middle, and late seed development in B. juncea. Several key transcripts known as hallmarks of embryo development and endosperm cellularization, seed maturation and filling, and desiccation and dormancy were detected at early, middle and late stages of seed development, respectively.
The functional categories enriched among transcripts upregulated at an early stage highlighted cell wallrelated transcripts. It is well evidenced that cell wall loosening and weakening are crucial for the growth of developing embryos 33 . Several MADS and MYB TFs, enriched in the early stage in our data, are also known to regulate embryo development in Arabidopsis 34 . One of the most well-known candidates was MINISEED3 (MINI3), which encodes a WRKY10 transcriptional activator and acts as a pivotal regulator of embryo development and cell size by regulating cell division and elongation 35 . It is also recently shown in B. napus as a potential regulator of CK responses in developing endosperm 36 . Some of the other important candidates were homologs of AMINO ACID PERMEASES1 (AAP1), ARABIDOPSIS FORMIN HOMOLOGY5 (AtFH5), (RETARDED GROWTH OF EMBRYO1 (RGE1), etc. AAP1 imports amino acids in the embryos and affects the content of carbon, nitrogen, and seed storage compounds 37 . AtFH5 codes for a formin protein, crucial for maintaining actin organization and cytokinesis, and is important for normal endosperm cellularization 38 . RGE1 codes for a bHLH TF, which regulates embryo development 39 .
The pathway categories enriched in the middle stage also conformed to the physiological changes during the middle stage of seed development. For example, brassinosteroids (BRs) regulate embryo and endosperm development 40   www.nature.com/scientificreports/ content of oleic acid and proteins in the oil 42,43 . FUS3 expresses during embryo maturation and is a significant regulator of seed filling during seed maturation 44 . Seed coat-related transcripts were also prominent in transcripts upregulated at the middle stage, for example, TRANSPARENT TESTA4 (TT4), TT10, TT16, TRANSPARENT TESTA GLABRA2 (TTG2), etc. TT10 codes for a laccase-like enzyme involved in the polymerization of flavonoids and regulates the browning of the seed coat 45 . TTG2 encodes a WRKY TF that controls the deposition of tannins in the seed coat 46 . The late-stage-upregulated transcripts were mainly related to seed dormancy and desiccation-related functions. For example, LEA proteins are crucial for desiccation and seed longevity in mature seeds 47 . bHLH TFs are known to participate in the regulation of DELAY OF GERMINATION1 (DOG1), which regulates seed dormancy 48 . Eight transcripts orthologous to DOG1 showed upregulation among the dormancy-related transcripts at the late stage. DOG1 is a significant regulator of seed dormancy and longevity, as shown in several species like Arabidopsis, wheat, lettuce, etc. It regulates dormancy in response to various endogenous and environmental factors by affecting multiple germination-related processes like activation of storage compounds, weakening seed coats, cell elongation, etc. 49 . These observations not only confirm that we could successfully capture the transcriptional repertoire regulating landmarks stages of seed development in B. juncea but also highlights high conservation in the key regulators modules regulating seed development in Arabidopsis and Brassica.
An extended period of cell proliferation is a possible cause of increased seed size in the bold-seeded variety. In the second approach, we used the transcripts upregulated explicitly in each of the varieties at different stages, and identified the crucial candidates governing variety-specific differences in seed size. We identified cell cycle and cell division-related transcripts as significant determinants of seed size. These were orthologous to genes encoding cyclins and CDKs, the regulatory and catalytic subunits of the protein kinases that regulate the cell cycle progression. For example, CDCs are crucial for DNA replication initiation while APCs are essential for cell cycle phase transition 50 . The decrease in expression of these transcripts in latestage in EH2, but not PJK, and the upregulation of several cell cycle-related transcripts in the late stage of PJK compared to EH2 indicates that cell division and cell cycle-related genes stay upregulated during later stages in PJK. At the same time, they are downregulated in EH2, suggesting that an extended period of cell proliferation is likely responsible for increased seed weight in PJK compared to EH2. Figure 7. Identification of candidate genes responsible for variety-specific differences in seed weight in Brassica juncea. Candidate genes regulating seed size/weight were identified from the literature. Seed size is mainly regulated by genes regulating cell proliferation or expansion in the seed coat, endosperm, or embryo. Those genes which showed differential expression between PJK and EH2 are shown here. Positive regulators are shown in blue and negative regulators are shown in red. Heatmaps are shown for orthologous transcripts in B. juncea transcriptome, only for the transcripts differentially expressed at early, middle, or late stages of seed development. Color coding for heatmap: white-not differentially expressed, green-upregulated in EH2, purple-upregulated in PJK. Yellow stars mark the transcripts that are good candidates responsible for varietyspecific seed size differences based on our transcriptome analysis. www.nature.com/scientificreports/ Interestingly, a similar finding has been recently made in B. rapa 27 . Based on the differential expression analysis between the seeds of high and low seed size cultivars, Niu et al. 2020 27 concluded that cell division and cell cycle are the significant determinants of seed size in B. rapa. They also suggested that the duration of cell cycle-specific gene expression at different seed development stages may be instrumental in determining seed size differences 27 . Further, Li et al. 2015 21 have also demonstrated that seed weight exhibits a higher correlation with cell number rather than the cell size of cotyledons and seed coats. These observations also hold up in B. napus 18 , indicating that cell number rather than cell size is the primary determinant of seed weight in Brassica. In another dicot, chickpea, an extended period of cell proliferation has been determined to contribute to cultivar-specific differences in seed weight 51 .
Since phytohormones are crucial regulators of the cell cycle, and auxins and cytokinins especially are classically known as positive regulators of cell proliferation, we also identified the phytohormone-related transcripts which may be responsible for regulating the cell cycle-related transcription 52,53 . Coexpression analysis suggested that TFs like E2Fs and phytohormones including BRs, CKS, GAs and auxins contribute to differential levels of cell cycle and cell division in developing seeds. E2Fs are well known for regulating the cell cycle during seed development 50 and were represented in 6 of the 8 clusters. E2F and B3 TFs were also found to coexpress with cell cycle genes in B. napus and were identified as possible hub genes for regulating cell proliferation in developing seeds 27 . BRs regulate cell division positively, and BRI1 mutants exhibit a decrease in seed size. Hence, BRI1 is a valuable candidate for transcripts clustered in group 1. GASA genes express in response to GAs, and GASA10 is previously known to regulate cell elongation positively 54 . Therefore, GASA10 may also be a useful candidate for regulating cell cycle-related transcripts. CKXs are cytokinin dehydrogenases regulating cell proliferation 55 and are thus, other potential candidates in the clusters. In B. napus, the cytokinin oxidase/dehydrogenase (CKX) gene family has been predicted to be a good candidate for manipulating seed size based on the transcriptome analysis of developing seeds and other tissues 56 .
Furthermore, candidate gene-based analysis identified orthologs of at least eight genes (BRI1, DET2, EOD3 MINI3, CKX2, WRI1, ANT, MKK5) which may be involved in regulating seed size differences in EH2 and PJK (Fig. 7). Among these, BRI1, DET2, EOD3, ANT, and MKK5 mainly regulate cell proliferation in the seed coat. MINI3 and CKX2 mainly control cell division in the endosperm, and WRI1 is primarily involved in regulating cell division in embryos. On the other hand, the negative regulator ARF2 mainly restricts cell division in the seed

S. No
Bju Transcript ID www.nature.com/scientificreports/ coat 57 . BRI1 and DET2 are related to BR signaling-mediated control of seed size. BRI1 mutants are BR insensitive and defective in BR reception. DET2 regulates cell size and cell number in embryos and cell size in integuments positively 58 . In our data, both orthologs of DET exhibited upregulation in PJK in the early stage. EOD3 codes for CytochromeP450/CYP78A6 and acts as an enhancer of DA1-1. It positively regulates the cell size of the integument 59 . ANT codes for an AP2 (APETALA2) TF and promotes cell proliferation in the integuments 60 . It has also been shown to act downstream of TF MEE45 (MATERNAL EFFECT EMBRYO ARREST45). ANT positively regulates the auxin signaling and promotes cell proliferation in integuments and possibly embryo 61 .
Only two of the Brassica orthologs of ANT showed upregulation in EH2 in the middle stage. Two orthologs showed upregulation in PJK in the late stage, and three orthologs showed upregulation in the early stage (Supplementary Table 15). MKK5 is a positive maternal regulator of seed size 62 , and three out of four orthologous transcripts showed upregulation in PJK in all three stages of seed development. MINI3, a WRKY TF, expressed in both endosperm and embryo, is a positive regulator of seed size. It regulates endosperm size and thus seed size. Mutants of MINI3 exhibit reduced seed size 63,64 . Two orthologs of MINI3 exhibited upregulation in PJK in the early stage, indicating its possible involvement in seed size regulation. CKX2 degrades cytokinin and thus reduces cytokinin signaling in the micropylar endosperm and controls endosperm size via the IKU2 pathway. Reduction of cytokinin activity through the IKU pathway is required for the active growth of endosperm. Hence, CKX2 is also a positive regulator of endosperm and seed size 65 . Of the three orthologs showing differential expression in our data, one showed upregulation in all three stages of seed development in PJK (Supplementary Table 15). Further, overexpression of Arabidopsis WRI1 enhances seed weight in Camelina sativa, primarily through cell expansion in embryos 66 . We observed upregulation of WRI1 ortholog in the middle stage in PJK. ARF2 is a negative regulator of seed size, repressing cell proliferation in the seed coat through mediating auxin signaling 67 , and was upregulated in EH2 and downregulated in PJK at all three stages of seed development (Supplementary  Table 15). Thus, these transcripts are good candidates for investigating variety-specific differences in seed size in B. juncea. ARF2 and ANT are the most interesting, as ARF2 represses ANT by binding to its promoter, and ANT positively regulates cell proliferation 68,69 . In our study, ARF2 is upregulated in EH2, while ANT is upregulated in PJK. Hence, this opposite expression pattern makes this ARF2-ANT cascade a strong candidate for future validation for investigating the molecular basis of seed size differences in these varieties.

Identification of candidates from East European and Indian gene pools of Brassica juncea
for improving seed quality. Utilizing the differential expression analysis between the species, we also shortlisted candidates to regulate seed coat color, oil content, and other traits ( Table 1, Supplementary Table 16). One of the seed coat color genes, TT8, has been previously shown in B. juncea to exhibit perfect co-segregation with the seed coat color phenotype 70 . In our study, TT8 ortholog B03_VARUNA_g5220.t1 showed upregulation in the brown-seeded variety (PJK) compared to the yellow-seeded variety (EH2), conforming to the expected pattern. In B. napus, TT8 has also been demonstrated to block phenylpropanoid synthesis, contributing to yellow seed color 71 . Similarly, orthologs of BAN (BANYULS), a late biosynthetic gene for flavonoid synthesis, were upregulated in all stages of PJK as compared to EH2. Downregulation of flavonoid biosynthesis-related genes like BAN, TT8, TT4, TT5, etc., have also been reported in yellow-seeded B. napus 25 . Moreover, various candidates related to TAG biosynthesis and metabolism were differentially expressed between EH2 and PJK. For instance, all four of the pivotal regulators of seed maturation and seed filling, LEC1, ABI3, FUS3, and LEC2 (LAFL) genes, were present among the differentially expressed transcripts in the PJK vs. EH2 comparison, and some were upregulated in EH2, while some were upregulated in PJK. These transcripts are suitable candidates for enhancing oil quality and content by contributing favorable alleles from both the varieties. Furthermore, upregulation of oil biosynthesis candidates like DGAT1, DGAT2, FAD2, LEC2 in PJK suggests that PJK can serve as a source of trait-enhancing alleles for not just seed size, but also for oil content. Previous studies have well-established that Indian gene pools have been selected for bigger seeds and higher oil content 14 . Hence, detailed transcriptional dynamics reported here can be further used to validate candidates for marker-assisted breeding for improving both oil content and seed size simultaneously using PJK as one of the parental lines.
Conclusions and future perspectives. We identified at least 136 cell cycle and cell division related transcripts like cyclins, CDCs, 14 Cyclophilins, APC genes, CDKs, and other miscellaneous cell cycle-related genes that exhibit a decrease in the expression in later stages of seed development in the low seed weight variety EH2, but not in the high seed weight variety PJK. Further, we identified the candidate TFs (E2F, MYB, B3, etc.) and phytohormone-related transcripts (112 transcripts), which might act as upstream regulators of these cell cyclerelated transcripts. Furthermore, eight previously known seed size-related genes (BRI1, DET2, EOD3 MINI3, CKX2, WRI1, ANT, and MKK5) were identified as the known candidates for explaining the variety-specific seed size differences. We also identified seed coat color and oil content-related transcripts differentially expressed between EH2 and PJK that can be utilized to improve oil quality. EH2 and PJK belong to distinct Indian and East European gene pools, respectively 2 . These gene pools are heterotic, and therefore have significant application in hybrid breeding in B. juncea 72 . The QTLs for seed size, coat color, oil content, and other seed traits have been previously reported in B. juncea 2,11,28,73 . However, marker-assisted breeding for improving these traits still lacks behind due to the scarcity of tightly linked markers and precise candidates. Hence, the candidates identified here (Table 1, Supplementary Tables 12 and 16) shall help address this lacuna. They can be prioritized for further validation and in crop improvement programs through genome editing or marker-assisted breeding to improve seed size and other seed-related economically important agronomic characters.  74 . Seeds were harvested at three developmental stages, i.e., 15, 30, and 45 days after pollination (D), immediately frozen in liquid nitrogen and stored at − 80 °C for RNA isolation. Seed weight was measured using five biological replicates of 20 seed each, and the average seed weight was calculated in mg. Thousand seed weight of mature seeds was calculated as described earlier 2 .
RNA isolation and sequencing. Total RNAs were isolated using the Spectrum ™ Plant Total RNA kit (Sigma-Aldrich) as per manufacturer's guidelines. Genomic DNA contamination was removed from the samples using the Turbo DNase kit (Ambion) following the manufacturer's instructions. RNA quality was checked using nanodrop, agarose gel electrophoresis, and Bioanalyzer. A total of 18 seeds samples with RIN > 7 were selected to prepare libraries using the Illumina Truseq™ RNA Sample Prep kit as per the manufacturer's instructions. Pairedend sequencing was performed using Illumina HiSeq 2500 with an average read length of 100 bp.
Analysis of RNA sequencing data. Quality filtering at Q30 and adaptor trimming were carried out using FastP 75 . High-quality reads were mapped to the genome assembly of B. juncea variety Varuna 31 using HISAT2 76 . Further, the aligned reads were assembled using StringTie v2.1.7 77 and expression was estimated as counts using featureCounts v.2.2.0 with default parameters 78 . The gene expression level was quantified in TPM 79 (transcripts per million). The correlation coefficient among biological replicates was calculated using Spearman's correlation coefficient using the R package. Pairwise differential expression analysis was performed between different developmental stages/ varieties using the R package DESeq2 v.1.32.0 78 . The significance of differential gene expression was considered by the cut-off threshold of absolute log2 fold change ≥ 1 (upregulated transcripts) or ≤ − 1 (downregulated transcripts) and Benjamini-Hochberg adjusted P-values < 0.05 (false discovery rate).
Functional analyses of transcriptomic data. Arabidopsis orthologs of Brassica transcripts were identified using BLASTx with TAIR protein (https:// www. arabi dopsis. org/ downl oad/ index-auto. jsp? dir=/ downl oad_ files/ Prote ins) sequences at an e-value of e < 10 -3 . Pathway mapping and enrichment analysis were carried out using MapMan (https:// mapman. gabipd. org/ mapman) as described earlier 80 . A p-value cut-off of ≤ 0.05 was used for enrichment analysis, followed by Benjamini Hochberg correction with a q-value cut-off of ≤ 0.05. Gene ontology analysis was performed using AgriGO (http:// syste msbio logy. cau. edu. cn/ agriG Ov2/c_ SEA. php). GO enrichment analysis was performed using a p-value cut of ≤ 0.05 followed by Hypergeometric test with the Bonferroni analysis method. Transcription factors were downloaded from Plant Transcription Factor Database (PTFDB v5.0) (http:// plant tfdb. gao-lab. org/). Heatmaps were generated using the MeV tool 81 . For identification of expressed transcripts, k-means clustering was performed using MeV. K-means clustering was performed to obtain 8 clusters using Pearson correlation distance metric with maximum number of iterations set to 1000. To identify candidate genes for seed-related traits, the functional_descriptions related to "seed" traits were downloaded from TAIR (https:// www. arabi dopsis. org/ index. jsp) database. ARALIP (http:// aralip. plant biolo gy. msu. edu/ pathw ays/ pathw ays) was used to identify triacylglycerol synthesis-related genes. For extracting glucosinolate biosynthesis-related genes, BRAD (http:// brass icadb. cn/#/) database was used. The use of plants in the present study complies with international, national, and/or institutional guidelines.

Data availability
The transcriptome datasets generated in this study are available in NCBI Sequence Read Archive (SRA) for the Bioproject: PRJNA824648 (https:// datav iew. ncbi. nlm. nih. gov/ object/ PRJNA 824648? revie wer= c9bo4 j34sn iee91 l288j g4eh73). Additional datasets supporting this study are included in the paper and in the supplementary files. www.nature.com/scientificreports/